# Large-scale Corpus Training

## Roberta Large Japanese
A large Japanese RoBERTa model pretrained on Japanese Wikipedia and the Japanese portion of CC-100, suitable for Japanese natural language processing tasks.
Tags: Large Language Model · Transformers · Japanese
Author: nlp-waseda · Downloads: 227 · Likes: 23
## Opus Mt Tc Big Cat Oci Spa En
A neural machine translation model for translating from Catalan, Occitan, and Spanish to English, part of the OPUS-MT project.
Tags: Machine Translation · Transformers · Supports Multiple Languages
Author: Helsinki-NLP · Downloads: 24 · Likes: 2
## Opus Mt Tc Big En Ar
A neural machine translation model for translating from English to Arabic, part of the OPUS-MT project, with support for multiple target dialects.
Tags: Machine Translation · Transformers · Supports Multiple Languages
Author: Helsinki-NLP · Downloads: 4,562 · Likes: 23
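Multi-target OPUS-MT models such as the English-to-Arabic one above select the output variety via a sentence-initial language token of the form `>>xxx<<`. A minimal sketch of composing such an input; the helper name `mark_target` is illustrative, not part of any library:

```python
# Multi-target OPUS-MT models expect a sentence-initial token naming the
# desired target language/dialect, e.g. ">>ara<<" for Modern Standard Arabic.
# `mark_target` is an illustrative helper, not a library function.

def mark_target(text: str, target_lang: str) -> str:
    """Prepend the OPUS-MT target-language token to a source sentence."""
    return f">>{target_lang}<< {text}"

source = mark_target("You can now translate the sentence.", "ara")
print(source)  # >>ara<< You can now translate the sentence.
```

The marked string is then passed to the model's tokenizer like any other source sentence.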
## Icebert Xlmr Ic3
An Icelandic masked language model based on the RoBERTa-base architecture, fine-tuned from xlm-roberta-base.
Tags: Large Language Model · Transformers · Other
Author: mideind · Downloads: 24 · Likes: 0
## Icebert Ic3
An Icelandic masked language model based on the RoBERTa-base architecture, trained with the fairseq framework.
Tags: Large Language Model · Transformers · Other
Author: mideind · Downloads: 16 · Likes: 0
## Berdou 500k
A Portuguese BERT model, based on Bertimbau-Base, fine-tuned with masked language modeling (MLM) on 500,000 instances from the Brazilian Federal Official Gazette.
Tags: Large Language Model · Transformers
Author: flavio-nakasato · Downloads: 16 · Likes: 0
## Gerpt2 Large
License: MIT
GerPT2 is the large variant of a German GPT2, trained on the CC-100 corpus and German Wikipedia, and performs strongly on German text generation tasks.
Tags: Large Language Model · German
Author: benjamin · Downloads: 75 · Likes: 9
## Bert Base Qarib60 1970k
QARiB is a BERT model for Arabic and its dialects, trained on approximately 420 million tweets and 180 million sentences of text, supporting various Arabic NLP tasks.
Tags: Large Language Model · Arabic
Author: ahmedabdelali · Downloads: 41 · Likes: 1
## Bert Base Qarib60 1790k
QARiB is a BERT model for Arabic and its dialects, trained on approximately 420 million tweets and 180 million sentences of text, supporting various downstream NLP tasks.
Tags: Large Language Model · Arabic
Author: ahmedabdelali · Downloads: 16 · Likes: 2
## Indot5 Base
A T5 (Text-to-Text Transfer Transformer) base model pretrained on the Indonesian portion of the mC4 dataset; it requires fine-tuning before use.
Tags: Large Language Model · Transformers · Other
Author: Wikidepia · Downloads: 635 · Likes: 1
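Because T5 casts every task as text-to-text, fine-tuning a checkpoint like the one above amounts to preparing (input, target) string pairs, usually with a short task prefix. A minimal sketch; the helper name and the Indonesian prefix `"ringkas: "` are illustrative assumptions, not mandated by the IndoT5 checkpoint:

```python
# T5 fine-tuning data is plain (input, target) text pairs; a short task
# prefix tells the model which task a pair belongs to. The prefix string
# here is illustrative, not fixed by the pretrained checkpoint.

def make_t5_pair(document: str, summary: str, prefix: str = "ringkas: ") -> tuple[str, str]:
    """Build one text-to-text training pair for a summarization-style task."""
    return prefix + document, summary

inp, tgt = make_t5_pair("Jakarta adalah ibu kota Indonesia.", "Ibu kota: Jakarta.")
print(inp)  # ringkas: Jakarta adalah ibu kota Indonesia.
```

Such pairs are then tokenized and fed to a standard sequence-to-sequence training loop.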
## Rubert Base Cased Conversational
A Russian conversational model trained on OpenSubtitles, Dirty, Pikabu, and the social-media sections of the Taiga corpus.
Tags: Large Language Model · Other
Author: DeepPavlov · Downloads: 165.49k · Likes: 20
## Sroberta F
License: Apache-2.0
A RoBERTa model trained on a 43GB dataset of Croatian and Serbian, supporting masked language modeling tasks.
Tags: Large Language Model · Transformers · Other
Author: Andrija · Downloads: 51 · Likes: 2
## Est Roberta
Est-RoBERTa is a monolingual Estonian BERT-like model based on the RoBERTa architecture, trained on 2.51 billion tokens of Estonian text.
Tags: Large Language Model · Transformers · Other
Author: EMBEDDIA · Downloads: 155 · Likes: 4
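Several of the entries above (IceBERT, Berdou, QARiB, sRoBERTa, Est-RoBERTa) are masked language models, queried by blanking out a token, but the mask token differs by family: BERT-style tokenizers use `[MASK]`, while RoBERTa/XLM-R-style tokenizers use `<mask>`. A minimal sketch; `build_masked_input` is an illustrative helper (real code would read the token from the tokenizer's `mask_token` attribute):

```python
# BERT-style tokenizers use "[MASK]"; RoBERTa/XLM-R-style use "<mask>".
# `build_masked_input` is an illustrative helper for composing a fill-mask
# query; in practice the token comes from the loaded tokenizer itself.

MASK_TOKENS = {"bert": "[MASK]", "roberta": "<mask>"}

def build_masked_input(template: str, family: str) -> str:
    """Substitute the family-appropriate mask token into a template."""
    return template.replace("{mask}", MASK_TOKENS[family])

print(build_masked_input("Tallinn on Eesti {mask}.", "roberta"))
# Tallinn on Eesti <mask>.
```

Using the wrong mask token is a common source of silently poor fill-mask predictions, since the model never saw that literal string during pretraining.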